25 research outputs found

    EDSC: Efficient document subspace clustering technique for high-dimensional data

    With advances in pervasive technology, there has been a rapid rise in the size of data. Such data are generated by resources ranging from individuals to organizations. Because much of this data is unstructured or semi-structured, existing data analytics approaches are not directly applicable, which leads to the curse-of-dimensionality problem. Hence, this paper presents an Efficient Document Subspace Clustering (EDSC) technique for high-dimensional data that improves on existing systems by identifying and eliminating redundant data. Discrete segmentation of data points is used to explicitly expose the dimensionality of hidden subspaces in the clusters. The outcome of the proposed system was compared with an existing system to assess the effectiveness of the document clustering process for high-dimensional data. The processing time of EDSC for subspace clustering is reduced by 50% compared to the existing system.
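
    The abstract does not spell out how redundant data is eliminated; as a rough illustration of the general idea only (pruning constant and duplicate dimensions before clustering, with all names hypothetical), one could write:

```python
def prune_redundant_dims(points):
    """Drop constant and duplicate dimensions before clustering.

    points: list of equal-length feature vectors (rows).
    Returns (reduced_points, kept_column_indices).
    """
    columns = list(zip(*points))        # column view of the data
    kept, seen = [], set()
    for j, col in enumerate(columns):
        if len(set(col)) <= 1:          # constant column carries no information
            continue
        if col in seen:                 # exact duplicate of an earlier column
            continue
        seen.add(col)
        kept.append(j)
    reduced = [[row[j] for j in kept] for row in points]
    return reduced, kept
```

    Any subspace clustering algorithm would then run on the reduced matrix instead of the full-dimensional one.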

    RMSC: Robust Modeling of Subspace Clustering for high dimensional data

    Subspace clustering is one of the active research problems associated with high-dimensional data. Some standard techniques are reviewed here to investigate existing methodologies. Although various research techniques have evolved recently, they do not completely mitigate the problems of noise sustainability and optimization of clustering accuracy. Hence, a novel technique called Robust Modeling of Subspace Clustering (RMSC) is presented to solve this problem. An analytical research methodology is used to formulate two algorithms: one for computing outliers and one for extracting an elite subspace from high-dimensional data inflicted by different forms of noise. RMSC was found to offer higher accuracy and a lower error rate, both in the presence and the absence of noise, over high-dimensional data. © 2017 IEEE
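
    The two RMSC algorithms are not given in the abstract; shown below is only a generic standard-deviation rule illustrating what an outlier-computation step can look like (the threshold `k` and function name are assumptions, not RMSC's criterion):

```python
import statistics

def outliers(values, k=2.0):
    """Flag points more than k population standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []                       # all values identical: nothing to flag
    return [v for v in values if abs(v - mu) > k * sd]
```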

    Onto Collab: Strategic review oriented collaborative knowledge modeling using ontologies

    Modeling efficient knowledge bases to improve the semantic properties of the World Wide Web is essential for promoting innovation and development on the Web. There is a need for efficient and organized modeling of knowledge bases. In this paper, a strategy, Onto Collab, is proposed for the construction of knowledge bases using ontology modeling. Ontologies are visualized as the basic building blocks of knowledge on the web: they represent the cognitive bridge between the human conceptual understanding of real-world data and data processable by computing systems. A domain is visualized as a collection of similar ontologies. A review-based strategy over a secure messaging system is proposed for authoring ontologies, along with a platform for retracing the domain ontologies individually and as a team. Evaluations are carried out for ontologies constructed for a domain over non-wiki knowledge bases.

    Improving Response Time and Throughput of Search Engine with Web Caching

    Large web search engines need to process thousands of queries per second over collections of billions of web pages. As a result, query processing is a major performance bottleneck and cost factor in current search engines, and a number of techniques are employed to increase query throughput, including massively parallel processing, index compression, early termination, and caching. Caching is a useful technique for Web systems that are accessed by a large number of users. It enables a shorter average response time, reduces the workload on back-end servers, and reduces the overall amount of bandwidth used. Our contribution in this paper can be split into two parts. In the first part, we propose a Cached Search Algorithm (CSA) on top of multiple search engines, such as Google, Yahoo, and Bing, and achieve better response times when accessing the resulting web pages. In the second part, we design and implement a Cached Search Engine and evaluate its performance on training data (the WEPS dataset [1]) and test data (a Mobile dataset). The Cached Search Engine performs better by reducing the response time of the search engine and increasing the throughput of the searched results.
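
    As a minimal sketch of the caching idea described above (not the paper's actual CSA), an LRU cache keyed by query string could be:

```python
from collections import OrderedDict

class QueryCache:
    """Tiny LRU cache for search-engine results (illustrative only)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, query):
        if query not in self._store:
            return None                      # cache miss: ask the back-end
        self._store.move_to_end(query)       # mark as most recently used
        return self._store[query]

    def put(self, query, results):
        self._store[query] = results
        self._store.move_to_end(query)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used entry
```

    A hit avoids the back-end entirely, which is where the response-time and bandwidth savings come from.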

    Web Page Recommendation System using Self Organizing Map Technique

    The exponential explosion of content on the Web has made Recommendation Systems increasingly indispensable. Innumerable kinds of recommendations are made on the Web every day, including movies, music, images, books, query suggestions, and tags. This paper aims to provide users with the most relevant results (URLs) for a given query word. The baseline system uses the K-means technique, and the modified system uses the Self-Organizing Map technique. Both methods use historical browsing data for search keywords and provide users with the most relevant web pages. Each user's click-through activity, such as the number of visits, the duration spent, and several other variables, is stored in a database. The systems process this database to cluster and rank the pages. The results obtained show that the Self-Organizing Map technique produces more relevant results for a particular query word than the K-means technique, making it the preferred method for web page recommendation in this setting. The modified system can be utilized in many recommendation tasks on the World Wide Web, including expert finding, image recommendation, and image annotation. The experimental results show the promising future of our work.
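
    A toy one-dimensional self-organizing map, sketched here only to illustrate the technique the modified system relies on (all parameters are assumptions, and real click-through features would be vectors rather than scalars):

```python
import random

def train_som_1d(data, n_units=4, epochs=50, lr=0.5, radius=1):
    """Train a 1-D self-organizing map on scalar features.

    For each sample the best-matching unit (BMU) and its neighbours are
    pulled toward the sample, so similar items map to nearby units.
    """
    random.seed(0)
    weights = [random.random() for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrink the learning rate
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            for i in range(n_units):
                if abs(i - bmu) <= radius:    # neighbourhood update
                    weights[i] += lr * decay * (x - weights[i])
    return weights

def best_unit(weights, x):
    """Map a sample to its closest unit (its cluster on the map)."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))
```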

    Web search engine based semantic similarity measure between words using pattern retrieval algorithm

    Semantic similarity measures play an important role in information retrieval, natural language processing, and various web tasks such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we propose a Pattern Retrieval Algorithm (PRA) to compute the semantic similarity between words by combining a page-count method and a web-snippets method. Four association measures are used to find the semantic similarity between words in the page-count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web-snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed approach aims to improve correlation values, precision, recall, and F-measure compared to existing methods. The proposed algorithm achieves a correlation value of 89.8%, outperforming existing methods.
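
    The four association measures are not listed in the abstract; pointwise mutual information over page counts (WebPMI) is one commonly used such measure, sketched here under the assumption that the total index size `n_docs` is known:

```python
import math

def web_pmi(count_p, count_q, count_pq, n_docs):
    """Pointwise mutual information estimated from web page counts.

    count_p, count_q : hit counts for each word queried alone
    count_pq         : hit count for the conjunctive query "p AND q"
    n_docs           : assumed total number of indexed pages
    """
    if count_pq == 0:
        return 0.0                       # never co-occur: no association
    p_p = count_p / n_docs
    p_q = count_q / n_docs
    p_pq = count_pq / n_docs
    return math.log2(p_pq / (p_p * p_q))
```

    Scores like this for each measure would then feed the SVM as features of a word pair.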

    A Hybridized Framework for Ontology Modeling incorporating Latent Semantic Analysis and Content based Filtering

    In the era of the Semantic Web, organizing the necessary semantic information is vital for improving the overall retrieval efficiency of Semantic Web content. Ontologies are among the most important and most primary entities of the Semantic Web and are used for representing and modeling knowledge. Authoring of ontologies must be done in a highly systematic and organized manner in order to validate the correctness of the ontologies authored. Several traditional ontology authoring systems are based on Semantic Wikis, which use graphs to store the ontological entities; this increases the overall complexity of the ontologies and needs to be overcome. A hash-table-based ontology organization strategy is proposed, further empowered by Latent Semantic Analysis to compute ontological relevance. Several agents are incorporated to check the correctness of the ontologies. The proposed framework is further enhanced with Content-Based Filtering to yield better results. The proposed methodology yields an accuracy of 88.9%.
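
    A minimal sketch of a hash-table-backed ontology store of the kind described (the triple layout and method names here are assumptions, not the paper's design):

```python
from collections import defaultdict

class OntologyStore:
    """Store ontology triples in a hash table keyed by (subject, predicate),
    giving average O(1) lookup instead of a graph traversal."""

    def __init__(self):
        self._table = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._table[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        # .get avoids creating empty entries on failed lookups
        return self._table.get((subject, predicate), set())
```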

    PWIS: Personalized Web Image Search using One-Click Method

    Personalized web image search looks for the particular images a user intends to find on the Web. To search for images, a user might provide query terms such as a keyword or an image file, or click on an image, and the system determines the images similar to the query. The similarity used as the search criterion could be meta tags, the color distribution of images, region/shape attributes, etc. Web-scale image search engines such as Google and Bing rely on surrounding text features for image search. It is highly cumbersome for such engines to interpret a user's search intention from keyword queries alone, which introduces noise and high ambiguity into search results that are unfit for the user's context. This makes it necessary to use visual information to resolve the ambiguity in text-based image retrieval. In Google search, for example, the search text box auto-completes with similar keywords while the user is typing, which can diverge from the user's intention. To retrieve exact matches and capture the user's intention, we can allow a text query with extended or related images as suggestions. We propose an innovative web image search approach: it only requires the user to click on one query image, with minimal effort, and images from a pool fetched by text-based search are re-ranked based on both visual and textual content.
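
    The re-ranking step could be sketched as a weighted blend of textual and visual similarity to the one clicked image (the linear combination and the weight `alpha` are assumptions, not the paper's exact model):

```python
def rerank(candidates, text_score, visual_score, alpha=0.5):
    """Re-rank text-retrieved images by blending textual similarity with
    visual similarity to the clicked query image.

    text_score, visual_score : dicts mapping image id -> similarity in [0, 1]
    alpha                    : weight given to the visual signal
    """
    scored = [(alpha * visual_score[c] + (1 - alpha) * text_score[c], c)
              for c in candidates]
    return [c for score, c in sorted(scored, reverse=True)]
```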

    Enhanced neighborhood normalized pointwise mutual information algorithm for constraint aware data clustering

    Clustering of similar data items is an important technique for mining useful patterns. Training or learning is an important task for enhancing the performance of clustering. A constraint-learning semi-supervised methodology is proposed which incorporates an SVM and a Normalized Pointwise Mutual Information computation strategy to increase the relevance as well as the performance efficiency of clustering. The SVM classifier is of the hard-margin type and roughly classifies the initial set. A recursive re-clustering approach incorporating the ENNPI algorithm is proposed to achieve a higher degree of relevance in the final clustered set. An overall enriched F-measure value of 94.09% is achieved, compared to existing algorithms.
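
    Normalized pointwise mutual information itself has a standard form, which the ENNPI computation presumably builds on; a direct implementation from joint and marginal probabilities:

```python
import math

def npmi(p_x, p_y, p_xy):
    """Normalized pointwise mutual information, bounded in [-1, 1].

    npmi(x, y) = pmi(x, y) / (-log p(x, y)); +1 means perfect
    co-occurrence, 0 independence, -1 never co-occurring.
    """
    if p_xy == 0:
        return -1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))
```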

    S3DCE: Secure Storing and Sharing of Data in Cloud Environment using User Phrase

    A Distributed Cloud Environment (DCE) focuses mainly on securing data and sharing it safely with the user. Data leakage may occur through channel compromise or through the key managers. It is necessary to safeguard the communication channel between the entities before sharing the data, but even then the key managers may collude with intruders and reveal the user's encryption key. Securing the key using the user's phrase is the key concept of the proposed system, “Secure Storing and Sharing of Data in Cloud Environment using User Phrase” (S3DCE). It does not rely on any key manager to generate the key; instead, the user generates the key. To provide double security, a public key derived from the user's phrase also encrypts the encryption key. S3DCE guarantees privacy, confidentiality, and integrity of the user data during storing and sharing. The proposed method S3DCE is more efficient in terms of time, cost, and resource utilization compared to the existing algorithms DaSCE (Data Security for Cloud Environment with Semi-Trusted Third Party) [22] and DACESM (Data Security for Cloud Environment with Scheduled Key Managers) [23].
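
    The abstract does not specify how the key is derived from the user's phrase; a standard way to realize "the user generates the key" is a password-based KDF such as PBKDF2, sketched here as an assumption rather than S3DCE's actual construction:

```python
import hashlib
import os

def derive_key(phrase, salt=None, iterations=200_000):
    """Derive a 256-bit encryption key from a user phrase.

    PBKDF2-HMAC-SHA256 stretches the phrase so that no key manager ever
    needs to hold or generate the key; the salt must be stored alongside
    the ciphertext to re-derive the same key later.
    """
    if salt is None:
        salt = os.urandom(16)            # fresh random salt per user
    key = hashlib.pbkdf2_hmac("sha256", phrase.encode("utf-8"),
                              salt, iterations)
    return key, salt
```

    The same phrase and salt always yield the same key, so the data owner can re-derive it on any device without trusting a third party.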